Abstract

We consider a (random permutation model) binary search tree with $n$ nodes and give asymptotics on the log log scale for the height $H_n$ and saturation level $h_n$ of the tree as $n \to \infty$, both almost surely and in probability. We then consider the number $F_n$ of particles at level $H_n$ at time $n$, and show that $F_n$ is unbounded almost surely.

This is a work in progress; we hope to give further results on the asymptotics of $F_n$.

1 Introduction and main results 

Consider the complete rooted binary tree $\mathbb{T}$. We construct a sequence $T_n$, $n = 1, 2, \ldots$ of subtrees of $\mathbb{T}$ recursively as follows. $T_1$ consists only of the root. Given $T_n$, we choose a leaf $u$ uniformly at random from the set of all leaves of $T_n$ and add its two children to the tree to create $T_{n+1}$. Thus $T_{n+1}$ consists of $T_n$ and the children $u1$, $u2$ of $u$, and contains in total $2n + 1$ nodes, including $n + 1$ leaves. We call this sequence of trees $(T_n)_{n \ge 1}$ the binary search tree.



Figure 1: An example of the beginning of a binary search tree: at each stage, 
we choose uniformly at random from amongst the available leaves and add the 
children of the chosen leaf to the tree. 
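
The recursive construction can be simulated by tracking only the depths of the current leaves. The following is a minimal Python sketch, not code from the paper; the function names are illustrative. It relies on the observation that the greatest complete generation of the tree equals the smallest leaf depth.

```python
import random


def grow_leaf_depths(n, rng):
    """Return the depths of the n leaves of T_n (the root has depth 0)."""
    depths = [0]  # T_1 consists only of the root, which is a leaf
    for _ in range(n - 1):
        i = rng.randrange(len(depths))  # uniform choice among current leaves
        d = depths[i]
        depths[i] = d + 1      # first child takes the slot of the chosen leaf
        depths.append(d + 1)   # second child
    return depths


def height(depths):
    # H_n: greatest generation of any node, always attained by a leaf
    return max(depths)


def saturation_level(depths):
    # h_n: every generation up to the shallowest leaf is complete
    return min(depths)


def fringe_size(depths):
    # F_n: number of leaves at level H_n
    top = max(depths)
    return sum(1 for d in depths if d == top)


rng = random.Random(0)
for n in (10, 1000, 100000):
    depths = grow_leaf_depths(n, rng)
    print(n, height(depths), saturation_level(depths), fringe_size(depths))
```

Storing leaf depths rather than the full tree keeps each step constant-time, which is enough for the three statistics studied here.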

This model has various equivalent descriptions: for example one may construct $T_n$ by successive insertions into $\mathbb{T}$ of a uniform random permutation of $\{1, \ldots, n\}$. For a more detailed explanation of this and other constructions see Reed [9].

One interesting quantity in this model is the height $H_n$ of the tree $T_n$, that is, the greatest generation amongst all nodes of $T_n$ (where the root is defined to have generation 0); so $H_1 = 0$, $H_2 = 1$, $H_3 = 2$, and $H_4$ is either 2 (with probability 1/3) or 3 (with probability 2/3). Another is the saturation level $h_n$, defined to be the greatest complete generation of $T_n$, that is, the greatest generation $k$ such that all nodes of generation $k$ are present in $T_n$ (so $h_1 = 0$, $h_2 = 1$, $h_3 = 1$ and $h_4$ is 1 with probability 2/3 and 2 with probability 1/3). These two quantities, $H_n$ and $h_n$, have been studied extensively. Pittel [5] showed that there exist constants $c$ and $\gamma$ such that $H_n/\log n \to c$ and $h_n/\log n \to \gamma$ almost surely, and gave bounds on the values of $c$ and $\gamma$. Devroye [4] calculated $c$ exactly by showing that $H_n/\log n \to c$ in probability as $n \to \infty$; and Reed [9] showed that for the same $c$ and another known constant $d$, $E[H_n] = c\log n - d\log\log n + O(1)$. Drmota [5] and Reed [9] also showed that $\mathrm{Var}(H_n) = O(1)$.
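
The small-$n$ values quoted above can be checked by exact enumeration. The sketch below is an illustration rather than part of the paper; it propagates the law of the multiset of leaf depths with rational arithmetic, and the helper names `leaf_depth_distribution` and `law_of` are hypothetical. It uses the fact that the height is the largest leaf depth and the saturation level is the smallest.

```python
from collections import defaultdict
from fractions import Fraction


def leaf_depth_distribution(n):
    """Exact law of the sorted leaf depths of T_n, as {tuple: probability}."""
    dist = {(0,): Fraction(1)}  # T_1 is the root alone
    for _ in range(n - 1):
        nxt = defaultdict(Fraction)
        for depths, p in dist.items():
            size = len(depths)
            for d in set(depths):
                # some leaf at depth d is chosen with probability count / size
                weight = Fraction(depths.count(d), size)
                rest = list(depths)
                rest.remove(d)
                child = tuple(sorted(rest + [d + 1, d + 1]))
                nxt[child] += p * weight
        dist = dict(nxt)
    return dist


def law_of(statistic, n):
    """Exact distribution of statistic(depths) under the law of T_n."""
    out = defaultdict(Fraction)
    for depths, p in leaf_depth_distribution(n).items():
        out[statistic(depths)] += p
    return dict(out)


print(law_of(max, 4))  # law of H_4
print(law_of(min, 4))  # law of h_4
```

The number of distinct depth multisets grows quickly with $n$, so this approach is only practical for small trees.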

Our first aim in this article is to prove the following theorem.

Theorem 1. Let $a$ be the solution to
$$2(a - 1)e^{a} + 1 = 0, \quad a > 0$$
and let
$$b := 2ae^{a}$$
(we get $a \approx 0.76804$ and $b \approx 3.31107$). Then
$$\frac{1}{2} = \liminf_{n\to\infty} \frac{b\log n - aH_n}{\log\log n} < \limsup_{n\to\infty} \frac{b\log n - aH_n}{\log\log n} = \frac{3}{2}$$
almost surely and
$$\frac{b\log n - aH_n}{\log\log n} \xrightarrow{P} \frac{3}{2} \quad \text{as } n \to \infty.$$

Of course, $a$ and $b$ agree with the constants $c$ and $d$ mentioned above in the sense that $c = b/a$ and $d = 3/(2a)$. By the same methods, we obtain a similar theorem concerning $h_n$.

Theorem 2. Let $\alpha$ be the solution to
$$2(\alpha + 1)e^{-\alpha} - 1 = 0, \quad \alpha > 0$$
and let
$$\beta := 2\alpha e^{-\alpha}$$
(we get $\alpha \approx 1.6783$ and $\beta \approx 0.6266$). Then
$$\frac{1}{2} = \liminf_{n\to\infty} \frac{\alpha h_n - \beta\log n}{\log\log n} < \limsup_{n\to\infty} \frac{\alpha h_n - \beta\log n}{\log\log n} = \frac{3}{2}$$
almost surely and
$$\frac{\alpha h_n - \beta\log n}{\log\log n} \xrightarrow{P} \frac{3}{2} \quad \text{as } n \to \infty.$$

This shows in particular that the lower bound given by Pittel [5] is the correct growth rate for the saturation level $h_n$ on the log scale.
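
The numerical values of the constants in Theorems 1 and 2 can be reproduced with a few lines of root finding. This is a minimal sketch using plain bisection; the helper `bisect` and the bracketing intervals are choices made for this illustration (both defining functions are monotone on the brackets used, so each root is unique there).

```python
import math


def bisect(f, lo, hi, tol=1e-13):
    """Root of f on [lo, hi]; assumes f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid  # root lies in the upper half
        else:
            hi = mid             # root lies in the lower half
    return 0.5 * (lo + hi)


# Theorem 1: 2(a - 1)e^a + 1 = 0 with a > 0, and b = 2 a e^a.
a = bisect(lambda x: 2.0 * (x - 1.0) * math.exp(x) + 1.0, 0.0, 1.0)
b = 2.0 * a * math.exp(a)

# Theorem 2: 2(alpha + 1)e^(-alpha) - 1 = 0 with alpha > 0,
# and beta = 2 alpha e^(-alpha).
alpha = bisect(lambda x: 2.0 * (x + 1.0) * math.exp(-x) - 1.0, 0.0, 5.0)
beta = 2.0 * alpha * math.exp(-alpha)

print(a, b, alpha, beta)
print(b / a, 3.0 / (2.0 * a))  # the constants c and d
```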

Other aspects of the binary search tree model also give interesting results. The article by Chauvin et al. [3], for example, tracks the number of leaves at certain levels of the tree, called the profile of the tree, via convergence theorems for polynomial martingales associated with the system.

We are also interested in how many leaves are present at level $H_n$ of the tree at time $n$. We call the set of particles at this level the fringe of the tree, and call the size of the fringe $F_n$, so that $F_1 = 1$, $F_2 = 2$, $F_3 = 2$, and $F_4$ is 2 with probability 2/3 or 4 with probability 1/3. Note that the word "fringe" has been used also in a different context by, for example, Drmota et al. [5]. Trivially $F_n \in \{2, 4, 6, \ldots\}$ for all $n \ge 2$, and (given that $H_n \to \infty$ almost surely, which is a simple consequence of Theorem 1) $\liminf_{n\to\infty} F_n = 2$ almost surely. We are able to prove the following preliminary result.

Proposition 3. We have
$$\limsup_{n\to\infty} F_n = \infty$$
almost surely.

Further work on the behaviour of $F_n$ in the limit as $n \to \infty$ is underway.

Our main tool throughout is the relationship between binary search trees and an extremely simple continuous time branching random walk, called the Yule tree. This relationship is well-known; see Aldous & Shields [1] and Chauvin et al. [3]. The hard work required for Theorem 1 is then done for us by a remarkable result of Hu & Shi [7]. We introduce the Yule tree model in Section 2 before proving Theorems 1 and 2 in Section 3. Finally we study $F_n$, and in particular prove Proposition 3, in Section 4.

2 The Yule tree 

Consider a branching random walk in continuous time with branching rate 1, starting with one particle at the origin, in which if a particle with position $x$ branches it is replaced by two children with position $x - 1$. That is:

• We begin with one particle at 0;

• All particles act independently;

• Each particle lives for a random amount of time, exponentially distributed with parameter 1;

• Each particle has a position $x$ which does not change throughout its lifetime;

• At its time of death, a particle with position $x$ is replaced by two offspring with position $x - 1$.

We call this process a Yule tree. Let $N(t)$ be the set of particles alive at time $t$, and for a particle $u \in N(t)$ define $X_u(t)$ to be the position of $u$ at time $t$. Let $M(t)$ denote the smallest of these positions at time $t$, and $S(t)$ the largest; that is,
$$M(t) := \inf\{X_u(t) : u \in N(t)\}$$
and
$$S(t) := \sup\{X_u(t) : u \in N(t)\}.$$
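
The process just defined is straightforward to simulate. The sketch below is one possible implementation, assuming the standard fact that with $k$ particles alive the time to the next branching is exponential with rate $k$ and the branching particle is uniform among the $k$; the function name `simulate_yule_positions` is illustrative.

```python
import random


def simulate_yule_positions(t_end, rng):
    """Positions of the particles alive at time t_end in the Yule tree."""
    positions = [0]  # one particle at the origin
    t = 0.0
    while True:
        # with k particles alive, the next branching occurs after an
        # Exp(k) waiting time (minimum of k independent Exp(1) lifetimes)
        t += rng.expovariate(len(positions))
        if t > t_end:
            return positions
        i = rng.randrange(len(positions))  # each particle equally likely
        x = positions[i]
        positions[i] = x - 1       # first child
        positions.append(x - 1)    # second child


rng = random.Random(2024)
pos = simulate_yule_positions(5.0, rng)
M = min(pos)  # M(t): smallest position
S = max(pos)  # S(t): largest position
print(len(pos), M, S)
```

Since the expected number of particles at time $t$ is $e^t$, this direct simulation is only practical for moderate values of `t_end`.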

We note that if we look at the Yule tree model only at integer times, then 
we have a discrete-time branching random walk. On the other hand, we have 
the following simple relationship between the Yule tree process and the binary 
search tree process. 

Lemma 4. Let $T_1 = 0$ and for $n \ge 2$ define
$$T_n := \inf\{t > T_{n-1} : N(t) \ne N(T_{n-1})\}$$
so that the times $T_n$ are the birth times of the branching random walk. Then we may construct the Yule tree process and the binary search tree process on the same probability space, such that
$$-M(T_n) = H_n \quad \forall n \ge 1$$
and
$$-S(T_n) = h_n \quad \forall n \ge 1$$
almost surely.

Proof. By the memoryless property of the exponential distribution, at any time $t$ the probability that a particular particle $u \in N(t)$ will be the next to branch is exactly $1/\#N(t)$. Thus, if we consider the sequence of genealogical trees produced by the Yule tree process at the times $T_j$, $j \ge 1$, we have exactly the binary search tree process: particles in $N(t)$ correspond to leaves in the binary search tree. Clearly the position of a particle in the Yule tree process is $-1$ times its height in the genealogical tree, so $-M(T_n)$ is the greatest leaf depth, which is $H_n$. Likewise $-S(T_n)$ is the smallest leaf depth $m$: every node of generation $m$ is then present, while the children of a leaf at depth $m$ are not, so the greatest complete generation is $m$ (for example $S(T_1) = 0$ and $h_1 = 0$). Hence we may build the Yule tree process and binary search tree process on the same probability space and then $-M(T_n) = H_n$ and $-S(T_n) = h_n$ for all $n \ge 1$ (almost surely). □

We would like to study $(H_n, n \ge 1)$ via knowledge of $(M(t), t \ge 0)$, and similarly for $h_n$ and $S(t)$, and hence it will be important to have control over the times $T_n$. It is well-known that $T_n$ is close to $\log n$. We give a simple martingale proof, as seen in Athreya & Ney [2].

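Lemma 5. There exists an almost surely finite random variable $\zeta$ such that
$$T_n - \log n \to \zeta \quad \text{almost surely as } n \to \infty;$$
and hence for any $\delta > 0$ we may choose $K \in \mathbb{N}$ such that
$$P\Big(\limsup_{n\to\infty} |T_n - \lfloor \log n \rfloor| > K\Big) < \delta$$
and
$$\limsup_{n\to\infty} P\big(|T_n - \lfloor \log n \rfloor| > K\big) < \delta.$$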
Remark. One may in fact show that $e^{-\zeta}$ is exponentially distributed with parameter 1.



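Proof. For each $n \ge 1$, let $V_n := n(T_{n+1} - T_n)$. Since exactly $n$ particles are alive between times $T_n$ and $T_{n+1}$, the random variables $V_n$, $n \ge 1$ are independent and exponentially distributed with parameter 1. Define
$$X_n := \sum_{j=1}^{n} \frac{V_j - 1}{j} = T_{n+1} - \sum_{j=1}^{n} \frac{1}{j}.$$
Then $X_n$ is clearly a zero-mean martingale; and
$$E[X_n^2] = \sum_{j=1}^{n} \frac{1}{j^2} \le \sum_{j=1}^{\infty} \frac{1}{j^2} < \infty$$
so by the martingale convergence theorem $X_n$ converges almost surely (and in $L^2$) to some almost surely finite limit $X$. But it is well-known that
$$\sum_{j=1}^{n} \frac{1}{j} - \log n$$
converges to some finite, deterministic constant. Since also $\log(n+1) - \log n \to 0$, this is enough to complete the proof of the first statement in the Lemma, and the next part is trivial: since $\zeta$ is almost surely finite, we may choose $K$ such that $P(|\zeta| > K) < \delta$. For the final part, we may either use Fatou's lemma:
$$\limsup_{n\to\infty} P(|T_n - \log n| > K + 1) \le E\Big[\limsup_{n\to\infty} 1_{\{|T_n - \log n| > K + 1\}}\Big] \le P\Big(\limsup_{n\to\infty} |T_n - \log n| > K\Big);$$
or, for a more elementary proof, apply Chebyshev's inequality to the martingale $X_n$:
$$P\Big(\Big|T_{n+1} - \sum_{j=1}^{n} \frac{1}{j}\Big| > K\Big) = P(|X_n| > K) \le \frac{E[X_n^2]}{K^2} \le \frac{1}{K^2}\sum_{j=1}^{\infty} \frac{1}{j^2}.$$
□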
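
As a numerical illustration of Lemma 5, one can sample $T_n$ directly: with $T_1 = 0$, the waiting time while $j$ particles are alive is exponential with rate $j$, so $T_n$ is a sum of independent exponentials with rates $1, \ldots, n-1$. The sketch below (an illustration with hypothetical function names, not part of the proof) compares the sample mean of $T_n - \log n$ with its exact value $\sum_{j=1}^{n-1} 1/j - \log n$, which is close to Euler's constant $0.5772\ldots$ for large $n$.

```python
import math
import random


def sample_birth_time(n, rng):
    """Sample T_n, the time at which the Yule tree first has n particles."""
    # while j particles are alive, the wait for the next branching is Exp(j)
    return sum(rng.expovariate(j) for j in range(1, n))


def mean_centred_birth_time(n, samples, rng):
    """Monte Carlo estimate of E[T_n] - log n."""
    total = sum(sample_birth_time(n, rng) for _ in range(samples))
    return total / samples - math.log(n)


rng = random.Random(12345)
n = 500
estimate = mean_centred_birth_time(n, 2000, rng)
exact = sum(1.0 / j for j in range(1, n)) - math.log(n)  # E[T_n] - log n
print(estimate, exact)
```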



We mentioned above that, if we look at the Yule tree only at integer times, 
we see a discrete-time branching random walk. Since discrete-time branching 
random walks are more widely studied than their continuous-time counterparts 
(in particular the theorem that we would like to apply is stated only in discrete- 
time), it will be helpful to know the branching distribution of the discrete model. 
This is a standard calculation. 

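Lemma 6. We have
$$E\Big[\sum_{u \in N(1)} e^{-\theta X_u(1)}\Big] = \exp(2e^{\theta} - 1).$$

Proof. Let
$$E_\theta(t) = E\Big[\sum_{u \in N(t)} e^{-\theta X_u(t)}\Big]$$
and for $s, t \ge 0$ and a particle $u \in N(t)$ define $N_u(t; s)$ to be the set of descendants of particle $u$ alive at time $t + s$: that is, $N_u(t; s) := \{v \in N(t + s) : u \le v\}$. Then by the Markov property, writing $\mathcal{F}_t$ for the information generated by the process up to time $t$,
$$\begin{aligned}
E_\theta(t + s) &= E\Big[\sum_{v \in N(t+s)} e^{-\theta X_v(t+s)}\Big] \\
&= E\Big[\sum_{u \in N(t)} e^{-\theta X_u(t)} \, E\Big[\sum_{v \in N_u(t;s)} e^{-\theta (X_v(t+s) - X_u(t))} \,\Big|\, \mathcal{F}_t\Big]\Big] \\
&= E\Big[\sum_{u \in N(t)} e^{-\theta X_u(t)}\Big] E_\theta(s) \\
&= E_\theta(t) E_\theta(s).
\end{aligned}$$
We deduce that for $s, t > 0$,
$$\frac{E_\theta(t + s) - E_\theta(t)}{s} = E_\theta(t)\,\frac{E_\theta(s) - 1}{s}$$
and, for $s \le t$,
$$\frac{E_\theta(t - s) - E_\theta(t)}{-s} = E_\theta(t - s)\,\frac{E_\theta(s) - 1}{s}.$$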

It is easily checked that $E_\theta(t)$ is continuous in $t$, and hence if $E_\theta'(0+)$ exists then by the above we have that $E_\theta(t)$ is continuously differentiable and for all $t > 0$
$$E_\theta'(t) = E_\theta(t) E_\theta'(0+).$$
Since $E_\theta(0) = 1$ this entails that
$$E_\theta(t) = \exp(E_\theta'(0+)\,t).$$
Now, for small $t$,
$$\begin{aligned}
E_\theta(t) &= P(\text{first split after } t) + 2e^{\theta} P(\text{first split before } t) + o(t) \\
&= 1 - t + 2te^{\theta} + o(t)
\end{aligned}$$
so that $E_\theta'(0+) = 2e^{\theta} - 1$, and hence $E_\theta(t) = \exp((2e^{\theta} - 1)t)$. Taking $t = 1$ completes the proof. □
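
The identity of Lemma 6 can also be checked by Monte Carlo. The following sketch simulates the Yule tree up to time 1 and averages $\sum_{u \in N(1)} e^{-\theta X_u(1)}$; the function names, sample size and seed are arbitrary choices for this illustration, and agreement is only up to sampling error.

```python
import math
import random


def additive_functional_at_time_one(theta, rng):
    """One sample of the sum over u in N(1) of exp(-theta * X_u(1))."""
    positions = [0]
    t = rng.expovariate(1)  # lifetime of the initial particle
    while t <= 1.0:
        i = rng.randrange(len(positions))  # uniformly chosen particle branches
        x = positions[i]
        positions[i] = x - 1
        positions.append(x - 1)
        t += rng.expovariate(len(positions))  # Exp(k) wait with k particles
    return sum(math.exp(-theta * x) for x in positions)


def monte_carlo_mean(theta, samples, rng):
    total = sum(additive_functional_at_time_one(theta, rng)
                for _ in range(samples))
    return total / samples


rng = random.Random(6)
theta = 0.3
print(monte_carlo_mean(theta, 20000, rng),
      math.exp(2.0 * math.exp(theta) - 1.0))
```

Larger values of $\theta$ make the summand heavier-tailed, so more samples are needed for comparable accuracy.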

These simple properties of the Yule tree will allow us to prove our main 
theorem. 



3 Proof of Theorems 1 and 2

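We would like to apply the following theorem of Hu and Shi [7]. This result was proved for a large class of branching random walks; our particular simple case (when recentred) trivially satisfies the assumptions in [7], and so we omit those assumptions here.

Theorem 7 (Hu, Shi [7]). Define
$$\psi(\theta) := E\Big[\sum_{u \in N(1)} e^{-\theta X_u(1)}\Big].$$
If $\theta^*$ satisfies
$$\frac{\theta^* \psi'(\theta^*)}{\psi(\theta^*)} = \log\psi(\theta^*), \quad \theta^* > 0,$$
then
$$\frac{1}{2} = \liminf_{n\to\infty} \frac{\theta^* M(n) + n\log\psi(\theta^*)}{\log n} < \limsup_{n\to\infty} \frac{\theta^* M(n) + n\log\psi(\theta^*)}{\log n} = \frac{3}{2}$$
almost surely, and
$$\frac{\theta^* M(n) + n\log\psi(\theta^*)}{\log n} \xrightarrow{P} \frac{3}{2} \quad \text{as } n \to \infty.$$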

In view of this result, our method of proof for Theorems 1 and 2 is unsurprising: we know that the times $T_n$ are near $\log n$ for large $n$, and we may use the monotonicity of $H_n$ and $h_n$, together with the flexibility offered by the log log scale, to ensure that nothing else can go wrong. It may be possible to extend this method of proof to cover more general trees, where the same monotonicity property does not necessarily hold, via a Borel-Cantelli argument. This would only introduce unnecessary complications in our case.

Proof of Theorem 1. We show first the statement involving the limsup; the proofs of the other statements are almost identical.

It is immediate from Lemma 6 that $a$ in Theorem 1 corresponds to $\theta^*$ in Theorem 7, and that $b$ corresponds to $\log\psi(\theta^*)$. Fix $\delta > 0$. Choose $K \in \mathbb{N}$ such that
$$P\Big(\limsup_{n\to\infty} |T_n - \lfloor\log n\rfloor| > K\Big) < \delta$$
(this is possible by Lemma 5). For each $n \ge 1$, let $j_n = \lfloor\log n\rfloor - K$. We use the abbreviation "i.o." to mean "infinitely often"; that is, for a sequence of measurable sets $U_n$, $\{U_n \text{ i.o.}\}$ represents the event $\limsup_{n\to\infty} U_n$. For any $\varepsilon > 0$, using the fact that $M(t)$ is non-increasing,
$$\begin{aligned}
&P\big(aM(T_n) + b\log n > (3/2 + \varepsilon)\log\log n \text{ i.o.}\big) \\
&\quad\le P\big(\{aM(T_n) + b\log n > (3/2 + \varepsilon)\log\log n,\ |T_n - \lfloor\log n\rfloor| \le K\} \text{ i.o.}\big) + P\big(|T_n - \lfloor\log n\rfloor| > K \text{ i.o.}\big) \\
&\quad\le P\big(aM(j_n) > -bj_n + (3/2 + \varepsilon)\log j_n + (bj_n - b\log n) + (3/2 + \varepsilon)(\log\log n - \log j_n) \text{ i.o.}\big) + \delta \\
&\quad\le P\big(aM(j_n) > -bj_n + (3/2 + \varepsilon/2)\log j_n \text{ i.o.}\big) + \delta \\
&\quad\le \delta
\end{aligned}$$
by Theorem 7. Taking a union over $\varepsilon > 0$ tells us that
$$P\Big(\limsup_{n\to\infty} \frac{aM(T_n) + b\log n}{\log\log n} > \frac{3}{2}\Big) \le \delta;$$
but since $\delta > 0$ was arbitrary we deduce that
$$P\Big(\limsup_{n\to\infty} \frac{aM(T_n) + b\log n}{\log\log n} > \frac{3}{2}\Big) = 0.$$

This completes the proof of the upper bound, since $H_n = -M(T_n)$. The proof of the lower bound is similar. We let $i_n = \lfloor\log n\rfloor + K$ and use the abbreviation "ev." to mean "eventually" (that is, for all large $n$; so $\{U_n \text{ ev.}\}$ represents the event $\liminf_{n\to\infty} U_n$). For any $\varepsilon \in (0, 3/2)$,
$$\begin{aligned}
&P\big(aM(T_n) + b\log n < (3/2 - \varepsilon)\log\log n \text{ ev.}\big) \\
&\quad\le P\big(\{aM(T_n) + b\log n < (3/2 - \varepsilon)\log\log n,\ |T_n - \lfloor\log n\rfloor| \le K\} \text{ ev.}\big) + P\big(|T_n - \lfloor\log n\rfloor| > K \text{ i.o.}\big) \\
&\quad\le P\big(aM(i_n) < -bi_n + (3/2 - \varepsilon)\log i_n + (bi_n - b\log n) + (3/2 - \varepsilon)(\log\log n - \log i_n) \text{ ev.}\big) + \delta \\
&\quad\le P\big(aM(i_n) < -bi_n + (3/2 - \varepsilon/2)\log i_n \text{ ev.}\big) + \delta \\
&\quad\le \delta
\end{aligned}$$
by Theorem 7. As with the upper bound, taking a union over $\varepsilon > 0$, and then letting $\delta \to 0$, tells us that
$$P\Big(\limsup_{n\to\infty} \frac{aM(T_n) + b\log n}{\log\log n} < \frac{3}{2}\Big) = 0$$
and hence combining with the upper bound we obtain
$$\limsup_{n\to\infty} \frac{b\log n - aH_n}{\log\log n} = \frac{3}{2}$$

almost surely. The proof of the statement involving the liminf is almost identical, and we omit it for the sake of brevity. The convergence in probability is also similar: one considers for example that
$$\begin{aligned}
&\limsup_{n\to\infty} P\big(aM(T_n) + b\log n > (3/2 + \varepsilon)\log\log n\big) \\
&\quad\le \limsup_{n\to\infty} P\big(aM(T_n) + b\log n > (3/2 + \varepsilon)\log\log n,\ |T_n - \lfloor\log n\rfloor| \le K\big) + \limsup_{n\to\infty} P\big(|T_n - \lfloor\log n\rfloor| > K\big) \\
&\quad\le \limsup_{n\to\infty} P\big(aM(T_n) + b\log n > (3/2 + \varepsilon)\log\log n,\ |T_n - \lfloor\log n\rfloor| \le K\big) + \delta
\end{aligned}$$
and uses the statement about convergence in probability in Theorem 7 to show that the probability in the last line above converges to zero for any $\varepsilon > 0$. Then since $\delta > 0$ was arbitrary we must have
$$\limsup_{n\to\infty} P\big(aM(T_n) + b\log n > (3/2 + \varepsilon)\log\log n\big) = 0.$$
The lower bound is, again, similar. □
Proof of Theorem 2. Consider a slightly altered Yule tree model, where each particle gives birth to two children whose position is that of their parent plus 1, instead of minus 1. If we couple this model with the usual Yule tree model in the obvious way, then clearly the minimal position of a particle in the altered model is equal to $-1$ times the maximal position in the usual model. Thus if we let $\tilde{M}(t)$ be the minimal position in the altered model, it suffices to show that
$$\frac{1}{2} = \liminf_{n\to\infty} \frac{\alpha\tilde{M}(T_n) - \beta\log n}{\log\log n} < \limsup_{n\to\infty} \frac{\alpha\tilde{M}(T_n) - \beta\log n}{\log\log n} = \frac{3}{2}$$
almost surely and
$$\frac{\alpha\tilde{M}(T_n) - \beta\log n}{\log\log n} \xrightarrow{P} \frac{3}{2} \quad \text{as } n \to \infty.$$
Lemma 6 (substituting $\theta := -\theta$, say) tells us that for the altered model, $\alpha$ in Theorem 2 corresponds to $\theta^*$ in Theorem 7 and that $-\beta$ corresponds to $\log\psi(\theta^*)$. The rest of the proof proceeds exactly as in the proof of Theorem 1. □
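
The correspondence between the constants and Theorem 7 can be checked numerically. The sketch below takes the defining condition for $\theta^*$ to be $\theta^*\psi'(\theta^*)/\psi(\theta^*) = \log\psi(\theta^*)$ (an assumption of this illustration), uses the formula of Lemma 6 for $\psi$ in both models, and evaluates the residuals at the rounded constants quoted in Theorems 1 and 2. The function names are hypothetical.

```python
import math


def psi_usual(theta):
    # Lemma 6: psi(theta) = exp(2 e^theta - 1) for the usual Yule tree
    return math.exp(2.0 * math.exp(theta) - 1.0)


def psi_altered(theta):
    # same formula with theta replaced by -theta (children at position + 1)
    return math.exp(2.0 * math.exp(-theta) - 1.0)


def fixed_point_gap(psi, theta, h=1e-6):
    """theta * psi'(theta) / psi(theta) - log psi(theta), by central differences."""
    dpsi = (psi(theta + h) - psi(theta - h)) / (2.0 * h)
    return theta * dpsi / psi(theta) - math.log(psi(theta))


a, b = 0.76804, 3.31107        # rounded values quoted in Theorem 1
alpha, beta = 1.6783, 0.6266   # rounded values quoted in Theorem 2

# both residuals in each line should be close to zero
print(fixed_point_gap(psi_usual, a), math.log(psi_usual(a)) - b)
print(fixed_point_gap(psi_altered, alpha), math.log(psi_altered(alpha)) + beta)
```

The residuals are limited by the rounding of the quoted constants rather than by the finite-difference step.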



4 The size of the fringe, $F_n$

We are now interested in the size of the fringe of the tree: how many leaves lie at level $H_n$ at time $n$. Recall that we called this quantity $F_n$.




Figure 2: The top three levels of a binary search tree run for $10^9$ steps. The thick blue line shows the size of the fringe, $F_n$, which is the number of leaves at level $H_n$; the thin red line shows the number of leaves at level $H_n - 1$; and the dashed green line shows the number of leaves at level $H_n - 2$.



We will show that $F_n$ is unbounded almost surely, but first we need a short lemma. For this lemma we consider again the Yule tree model, and call the set of particles with position $M(t)$ the frontier of the Yule tree at time $t$; recall that this is the set of particles with minimal position at time $t$, so as we saw earlier the frontier of the Yule tree corresponds to the fringe of the binary search tree. Define $F_t$ to be the number of particles at the frontier at time $t$,
$$F_t := \#\{u \in N(t) : X_u(t) = M(t)\}.$$

Lemma 8. If $M(t) < -\lfloor\log_2(2k)\rfloor$ and $F_t = 2k$, then there is at least one particle that is not at the frontier at time $t$, but which is within distance $\lfloor\log_2(2k)\rfloor$ of the frontier; that is, its position is in the interval $[M(t) + 1, M(t) + \lfloor\log_2(2k)\rfloor]$.

Proof. Clearly at some time before $t$ there was a particle which had position $M(t) + \lfloor\log_2(2k)\rfloor$; and hence at some time there were at least 2 particles with this position, since this position is strictly negative by assumption and particles (except the root) arrive in pairs. At time $t$, either these particles have at least one descendant not at the frontier, in which case we are done (as particles cannot move in the positive direction); or all their descendants are at the frontier. So, for a contradiction, suppose that all their descendants are at the frontier at time $t$. Then there must be $2 \times 2^{\lfloor\log_2(2k)\rfloor}$ particles at the frontier (since a movement of distance 1 yields 2 new particles, and hence a movement of distance $\lfloor\log_2(2k)\rfloor$ yields $2^{\lfloor\log_2(2k)\rfloor}$ new particles; and this holds for each of the two initial particles). But
$$2 \times 2^{\lfloor\log_2(2k)\rfloor} = 2^{\lfloor\log_2(2k)\rfloor + 1} > 2^{\log_2(2k)} = 2k$$
so there are strictly more than $2k$ particles at the frontier. This is a contradiction (there are exactly $2k$ particles at the frontier, by assumption), and hence our claim holds. □
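
The counting inequality at the heart of this proof is elementary, and can be confirmed exhaustively for a range of $k$. The short sketch below is only an illustration; it computes $\lfloor\log_2(2k)\rfloor$ exactly via the bit length of the integer $2k$.

```python
def floor_log2(m):
    """Exact floor(log2(m)) for a positive integer m."""
    return m.bit_length() - 1


def descendants_if_all_at_frontier(k):
    # two initial particles, each moving distance floor(log2(2k)) and
    # doubling its number of descendants at every unit step
    return 2 * 2 ** floor_log2(2 * k)


for k in (1, 2, 3, 4, 5, 8, 100):
    print(k, floor_log2(2 * k), descendants_if_all_at_frontier(k), 2 * k)
```

In each row the third column strictly exceeds the fourth, which is the contradiction used in the proof.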

We now prove Proposition 3, which we recall says that $\limsup_{n\to\infty} F_n = \infty$ almost surely.

Proof of Proposition 3. Again consider the continuous time Yule tree. By the relationship between the Yule tree and the binary search tree seen in Section 2, $F_t$ and $F_n$ have the same paths up to a time change, and hence it suffices to show that $\limsup_{t\to\infty} F_t = \infty$ almost surely.

The idea is as follows: suppose we have $2k$ particles at the frontier. By Lemma 8 there is a particle close to the frontier; and this particle has probability greater than some strictly positive constant of having 2 of its descendants make it to the frontier before the $2k$ already there branch. So if we have $2k$ particles infinitely often, then we have $2k + 2$ particles infinitely often. We make this argument rigorous below.

For any $k \in \mathbb{N}$, define
$$\tau_1^{(2k)} := \inf\{s > 0 : M(s) < -\lfloor\log_2(2k)\rfloor \text{ and } F_s = 2k\}$$
and for each $j \ge 1$
$$\sigma_j^{(2k)} := \inf\{s > \tau_j^{(2k)} : F_s \ne 2k\}$$
and
$$\tau_{j+1}^{(2k)} := \inf\{s > \sigma_j^{(2k)} : F_s = 2k\}.$$
Then $\tau_j^{(2k)}$ is the $j$th time that we have $2k$ particles at the frontier and at least distance $\log_2(2k)$ from the origin. We show, by induction on $k$, that for any $k \in \mathbb{N}$
$$\tau_j^{(2k)} < \infty \text{ almost surely, for all } j \in \mathbb{N}. \qquad (1)$$
Trivially, since $M(t) \to -\infty$ almost surely (which is true since $H_n \to \infty$ almost surely), we have $\tau_j^{(2)} < \infty$ almost surely for all $j \in \mathbb{N}$ and so (1) holds for $k = 1$. Suppose now (1) holds for some $k \ge 1$.

By Lemma 8, for any $j$, at time $\tau_j^{(2k)}$ there is at least 1 particle that is not at the frontier but is within distance $\lfloor\log_2(2k)\rfloor$ of the frontier. Let $A_j^{(2k)}$ be the event that the descendants of this particle reach level $M(\tau_j^{(2k)})$ before any of the $2k$ particles already at that level branch. Then the events $A_1^{(2k)}, A_2^{(2k)}, A_3^{(2k)}, \ldots$ are independent by the strong Markov property. Also, since all particles branch at rate 1, for each $j$ the probability of $A_j^{(2k)}$ is certainly at least the probability that the sum of $\lfloor\log_2(2k)\rfloor$ independent, rate 1 exponential random variables is less than the minimum of $2k$ independent, rate 1 exponential random variables. This is some strictly positive number, $\eta_k$ say.
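
The lower bound in the preceding paragraph (written $\eta_k$ here) has a simple closed form, which is a standard computation not stated in the text: the minimum of $2k$ independent rate 1 exponentials is exponential with rate $2k$, and by memorylessness each of the $m = \lfloor\log_2(2k)\rfloor$ rate 1 stages finishes before that clock rings with probability $1/(1 + 2k)$, giving $(1 + 2k)^{-m}$. A minimal sketch, with the function name `eta` chosen for illustration:

```python
from fractions import Fraction


def eta(k):
    """P(sum of m Exp(1) variables < min of 2k Exp(1) variables),
    where m = floor(log2(2k))."""
    m = (2 * k).bit_length() - 1  # exact floor(log2(2k))
    # each of the m stages must finish before the Exp(2k) clock rings,
    # which by memorylessness has probability 1 / (1 + 2k) per stage
    return Fraction(1, 1 + 2 * k) ** m


for k in range(1, 6):
    print(k, eta(k), float(eta(k)))
```

The bound decays quickly in $k$ but is strictly positive for every fixed $k$, which is all the argument requires.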

Now, at time $\tau_j^{(2k)}$ (which is finite for each $j$, by our induction hypothesis) there are $2k$ particles at the frontier. One of two things can happen: either two more particles join them and we reach $2k + 2$ particles at the frontier, or one of the $2k$ branches before this happens and we have a new frontier with 2 particles. Call the first event, that two more particles reach the frontier before any of the $2k$ already there branch, $B_j^{(2k)}$. Then $A_j^{(2k)} \subseteq B_j^{(2k)}$, since the event that some pair makes it to the frontier before the $2k$ branch contains the event that descendants of our particular particle make it to the frontier before the $2k$ branch. Thus, by the second Borel-Cantelli lemma applied to the independent events $A_j^{(2k)}$, whose probabilities are bounded below by the strictly positive constant above,
$$P\Big(\limsup_{j\to\infty} B_j^{(2k)}\Big) \ge P\Big(\limsup_{j\to\infty} A_j^{(2k)}\Big) = 1.$$
But the event $\limsup_{j\to\infty} B_j^{(2k)}$ is exactly the event that we have $2k + 2$ particles at the frontier infinitely often, and thus (using again that $M(t) \to -\infty$ almost surely) we have that $\tau_j^{(2k+2)}$ is finite almost surely for all $j$. Hence by induction we have proved that (1) holds for each $k$. Our result follows. □